Abstract

For lignocellulosic bioenergy to become a viable alternative to traditional energy production methods, rapid
increases in conversion efficiency and biomass yield must be achieved. Increased productivity in bioenergy
production can be achieved through concomitant gains in processing efficiency and genetic improvement of
feedstocks that have the potential for bioenergy production at an industrial scale. The purpose of this review is to
explore the genetic and genomic resource landscape for the improvement of a specific bioenergy feedstock group,
the C4 bioenergy grasses. First, bioenergy grass feedstock traits relevant to biochemical conversion are examined.
Then we outline the genetic resources available for mapping bioenergy traits to DNA markers and genes in
bioenergy grasses. This is followed by a discussion of genomic tools and how they can be applied to understanding
the genetic mechanisms of bioenergy grass feedstock traits, leading to further improvement opportunities.

Introduction

Paleobioenergy obtained from coal, natural gas and oil
deposits has allowed mankind to implement unprecedented
technological advances in the last 250 years.
Clearly, fossil fuels will not go away any time soon, but
they are a finite resource with a viable lifespan affected by
rapid population expansion (7 billion+; [1]) and by the threat
that further elevation of greenhouse gases poses to our ability
to respond to unpredictable variations in climate [2,3].
While the urgency for renewable energy sources to supplant
fossil fuels on a massive scale is debatable, the need
for alternative energy sources is evident. Bioenergy
obtained from renewable plant material is an excellent
component of any alternative energy portfolio.

Bioenergy feedstock selection is dependent upon many
economic factors including land use constraints [4] and
impact on other non-energy commodities [5], both of
which could be addressed through public policy. Other
feedstock factors can be addressed via rational selection of
existing feedstocks as well as improvement through plant
breeding and genetic modification. These factors include
energy density [6] and yield, cultivation costs [6],
transportation logistics [7], pre-processing requirements [7], and
conversion process [8]. For example, the scale-up of
fermentable corn biomass (grain) to ethanol production (1st
generation biofuel) in the U.S. in recent years has been
successful since the conversion technology and agricultural
infrastructure have matured [9]. Similarly, decades of
sugarcane production in Brazil made it possible for the
country to become a net energy exporting economy [10].
Conversely, the promise of converting biomass that is
recalcitrant to fermentation (lignocellulose) into viable
energy products (2nd generation biofuels) has yet to be
realized, primarily due to the lack of realistic conversion
techniques [11]. Thus, there is no turn-key lignocellulosic
bioenergy feedstock solution at this time, but extensive
research into efficient conversion process engineering and
favorable feedstock properties is well under way.

The purpose of this review is to explore the genetic and
genomic resource landscape for the improvement of a specific
bioenergy feedstock group, the bioenergy grasses. We
define bioenergy grasses as members of the grass family
(Poaceae) that employ C4 photosynthesis and are capable
of producing high biomass yield in the form of lignocellulose,
fermentable juice, or fermentable grain [12]. Given
their proven utility as feedstock in academic and industrial
interests, we focus on resources available for five specific
bioenergy grasses: Zea mays (maize), Saccharum spp.
(sugarcane), Sorghum bicolor (sorghum), Miscanthus spp.
(Miscanthus), and Panicum virgatum (switchgrass). First,
we discuss which grass feedstock traits are relevant to
bioenergy production, with a focus on biochemical conversion.
Next, we discuss genetic resources available for the
five bioenergy grasses to map bioenergy traits to genes.
Then, we discuss genomic tools and how they can be applied
to understanding the genetic mechanisms of bioenergy grass
feedstock traits, leading to further improvement
opportunities. Finally, we make the case for how modern
genetic, genomic, and systems biology approaches can
be coupled with bioprocessing constraints (industrial
phenotypes) to breed feedstock varieties tailored to an
industrial application.

Relevant bioenergy grass traits 

There are many extant bioenergy grass feedstock varieties
(genotypes) that are sufficient for select conversion
processes. For example, specific maize and
sugarcane genotypes have been successful bioenergy
grass feedstocks since high-yielding genotypes (grain and
juice, respectively) have been grown at large scale for
decades, and the conversion process (yeast fermentation)
is well understood at the industrial level. Recent attention
has been given to the more difficult problem of converting
2nd generation lignocellulose biomass into profitable
bioenergy products, which has the potential for
accessing the photosynthate locked into the plant cell
wall for conversion into useful products. Clearly, 2nd
generation genotypes that produce high dry weight
yields are of paramount importance, which is the opposite
direction from the Green Revolution, which led to small
plants with high grain yield [13]. However, the ideal is to
identify and improve bioenergy grass genotypes that combine
high biomass with an efficient response to a given
conversion process.

While there is much potential for bioenergy grasses as
feedstock for thermal conversion processes (e.g. combustion,
torrefaction, pyrolysis, and gasification), in this section
we explore traits relevant to lignocellulose biochemical
conversion processes, which convert biomass into fermentable
products through enzymatic hydrolysis (saccharification)
[11]. The bioenergy grass feedstock traits that underlie
conversion efficiency are being elucidated, opening the door to
genetic enhancement of existing feedstock.

Cellulase inhibition 

Cellulase enzyme cost is estimated to be ~50% of the
total cost of the commercial hydrolysis process [14]. In
addition, the enzymatic hydrolysis of lignocellulosic
material experiences a reduction in activity over time. This
reduction in activity has been attributed to hydrolysis
inhibition (end product and other [15-18]), reduction in
easily accessible cellulose (e.g. crystalline vs. amorphous
cellulose [19]), and reduction in efficient enzyme
adsorption. Increasing enzyme accessibility to cellulose has
been shown to play a crucial role in improving enzymatic
hydrolysis [20-24]. Finding efficient means to
increase enzymatic hydrolysis is vital to the success of
lignocellulosic bioenergy production.

Chemical inhibition of cellulase reduces the total
amount of reducing sugar produced for fermentation.
High concentrations of end-products are known
to reduce cellulase activity. For example,
while cellobiose is often a product of cellulases, it has
also been shown to be a significant inhibitor of the
activity of some cellulases [25]. This inhibition has been
shown to be reduced by supplementing β-glucosidase to
cellulase solutions lacking sufficient β-glucosidase
activity [26]. End-product inhibition by glucose has been
shown to reduce late stage hydrolysis rates [27-29]. In
addition to cellobiose, glucose has been shown to inhibit
cellulases derived from Trichoderma
species [30,31]. However, glucose does not appear to
inhibit cellulases from Aspergillus species to the same
degree [32-35]. For this reason, Trichoderma cellulases
are often supplemented with Aspergillus β-glucosidase to
increase saccharification efficiency on an industrial level
[36,37]. Additionally, xylose and arabinose, which are
produced during the hydrolysis of hemicellulose, have
been shown to inhibit cellulase activity [18,38]. End-product
inhibition of cellulases has led to simultaneous
saccharification and fermentation (SSF) systems becoming
popular, because fermenting the sugars as they are released
alleviates this inhibition.
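
The effect of end-product accumulation on hydrolysis rate can be illustrated with a small numerical sketch. The block below assumes simple Michaelis-Menten kinetics with competitive product inhibition, which is a common textbook simplification and not a model taken from the studies cited above. The function name and all parameter values (vmax, km, ki) are hypothetical placeholders chosen only for illustration.

```python
def hydrolysis_rate(substrate, product, vmax=1.0, km=5.0, ki=2.0):
    """Michaelis-Menten rate with competitive product inhibition.

    ASSUMPTION: competitive inhibition is used as a simplification;
    vmax, km and ki are illustrative placeholders, not measured constants.
    """
    if substrate < 0 or product < 0:
        raise ValueError("concentrations must be non-negative")
    # A competitive inhibitor raises the apparent Km by (1 + P/Ki).
    apparent_km = km * (1.0 + product / ki)
    return vmax * substrate / (apparent_km + substrate)


if __name__ == "__main__":
    # Same substrate level, increasing product (e.g. cellobiose) level.
    for p in (0.0, 2.0, 10.0):
        print(f"product={p:5.1f}  rate={hydrolysis_rate(20.0, p):.3f}")
```

Under these assumptions the rate falls as product accumulates at a fixed substrate concentration, which is the behaviour that β-glucosidase supplementation and SSF are intended to counteract.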

In addition to end-product inhibition, metal ions have
been shown to be inhibitory to cellulase hydrolysis
reactions. It is suggested that Fe(II) and Cu(II) oxidize the
reducing ends of cellulose, inhibiting the exo-cellulolytic
activity of cellulase [39-43]. However, not all metal ions
inhibit hydrolysis. Kim et al. found
that while Hg++, Cu++ and Pb++ decreased the
production of total reducing sugars, other metal ions
(Mn++, Ba++, and Ca++) increased the total
production of reducing sugars, indicating a stimulating
effect on hydrolysis [44]. Two of these ions (Hg++ and
Mn++) were shown to play a direct role in enzyme
adsorption. Additionally, Mg++ was shown to stimulate the
activity of glucanase from Bacillus cellulyticus [45]. The
activity of cellulase produced by Chaetomium
thermophilum was shown to be increased by Na+, K+ and Ca++,
but inhibited by Hg++, Zn++, Ag+, Mn++, Ba++, Fe++, Cu++,
and Mg++ [46]. This indicates that metal ions play an
important role in enzyme efficacy during hydrolysis, and that
knowledge of the correct ratio of metal ions is essential to
increasing hydrolysis activity.

Phenolic compounds are also known to inhibit cellulolytic
enzymes. These phenolics are often found in lignin,
and are released (as are their derivatives) during
pretreatment processes. The types of phenolics present
depend largely on the composition of the biomass in
combination with the type of pretreatment method employed
[47-49]. A variety of released phenolic compounds have
been identified during chemical pretreatment of
lignocellulosic biomass [50-52], and these have been shown to
inhibit conversion of carbohydrates into ethanol as well as
cellulase activity [38,53-56]. Cellulases, hemicellulases,
and β-glucosidase enzymes have all been shown to
be inhibited by these phenolic compounds [54,56-59]. The
magnitude of inhibition may be specific to the enzyme source, as
Aspergillus niger β-glucosidase was shown to be more
resilient to phenolic inhibition than Trichoderma
reesei β-glucosidase, requiring a 4x higher
concentration for inhibition [38]. Introduction of tannic
acid degrading enzymes (tannases) has been shown to
increase enzymatic hydrolysis, likely by reducing tannic
acid's propensity to interact with and inhibit cellulase [60].
Additionally, polyethylene glycol has been shown to
reduce inhibition of cellulase by tannins [61] by breaking up
tannin-protein complexes. Tween 80 and PEG-4000 have
been shown to prevent inhibition of β-glucosidase by
reducing the tannins' ability to bind the cellulase protein
[61,62]. Finding additional methods to reduce the role of
inhibitors in enzymatic hydrolysis is an important factor
in increasing hydrolysis efficiency and profitability.
Reducing the process-specific release of cellulase inhibitors
through tailored feedstock genotypes is an attractive
approach to enhancing enzymatic hydrolysis.

Cellulose accessibility 

Lignocellulosic material is a complex matrix of cellulose,
hemicellulose and lignin [63,64]. In un-pretreated
lignocellulosic samples, only a fraction of the cellulose is
accessible to enzymatic hydrolysis, while the rest of the
exposed biomass is lignin and hemicellulose. In order to
increase access to cellulose, pretreatment methods are
employed that aim to remove the lignin and hemicellulose
fraction and leave cellulose available for hydrolysis. In
addition, phenolic compounds such as ferulate play an
important role in crosslinking lignin within the cell wall (see
reviews [65-70]) and have the potential to be genetically
modified to aid in the removal of specific cell wall
components. There are many grass-specific features of the cell
wall which have the potential to be exploited for increased
bioenergy production [71]. For example, grass lignin is
composed of syringyl (S), guaiacyl (G)
and p-hydroxyphenyl (H) subunits that, when present in
varying ratios, may lead to increased digestibility [68].
However, debate remains about the role of lignin
subunits in conversion efficiency [72-75].

Removal of structural components such as hemicellulose
via dilute sulfuric acid pretreatment has been shown
to increase accessibility to cellulose for enzymatic
hydrolysis [76]. Removal of hemicellulose has been reported to
increase pore volume and surface area, further increasing the
accessibility of cellulose to cellulase [21]. Drying
lignocellulosic substrates after chemical pretreatment results
in the collapse of the newly formed pores, which decreases the
enzymatic hydrolysis rate by reducing the cellulose
available for hydrolysis [24,77]. Another pretreatment
strategy, which uses ionic liquids on switchgrass, was
shown to increase porosity by over 30 fold, greatly
increasing the accessibility of cellulose to enzymatic
digestion [78]. This indicates that pore size and volume may
play a significant role in increasing the rate of enzymatic
hydrolysis. The identification of bioenergy grass feedstock
genotypes that respond favorably to chemical pretreatment
can increase end-product yield.

Lignin has been shown to play a large role in enzymatic
conversion efficiency [79]. In Miscanthus sinensis, Yoshida
et al. showed that removal of lignin via sodium chlorite
resulted in an increase in enzymatic hydrolysis rate [80].
Yoshida et al. further demonstrated that the addition of
hemicellulases resulted in an increase in overall hydrolysis
rate, indicating that hemicellulose is an additional
inhibitor of cellulose hydrolysis rates [80]. Zhao et al. also
reported an increase in the enzymatic hydrolysis rate of
sugarcane bagasse after the removal of lignin with
peracetic acid [81]. Dissolution of lignocellulosic material with
ionic liquid has been shown to increase enzymatic
hydrolysis rates in wheat straw [82], corn stover [83] and
switchgrass [78]. Kimon et al. showed that dissolving
lignocellulosic material in ionic liquid at temperatures >150°C
has a large effect on saccharification of sugarcane bagasse
[84]. Additionally, ionic liquid pretreatment of switchgrass
was shown to increase hydrolysis kinetics by over 39 fold
over untreated switchgrass [78]. Ionic liquid pretreatment
has also been shown to break inter- and intra-molecular
hydrogen bonding between cellulose strands, causing an
increase in the removal of amorphous components (lignin,
hemicellulose) as well as an increase in surface area for
cellulase adsorption [85]. Both of these ionic liquid
pretreatments increased hydrolysis rates more than the
traditional methods they were compared with (dilute acid and
ammonium hydroxide, respectively). Singh et al. reported that
ionic liquid disrupted the inter- and intra-molecular
hydrogen bonding between lignin and cellulose, which initially
causes swelling of the plant cell wall followed by complete
dissolution [86]. Organosolv pretreatment of switchgrass
was shown to preferentially remove both lignin and
hemicelluloses, leaving a larger cellulose fraction, which
resulted in an increase in the enzymatic hydrolysis rate [87].
Rollin et al. showed that treating switchgrass with organosolv
resulted in a similar increase in surface area, causing
increased cellulase adsorption [88]. It is important to note
that the promising field of ionic liquid pretreatment is still
in its infancy. The current high costs of ionic liquid
pretreatment limit its application to industrial scale-up and,
like enzyme costs, must be reduced in order for the process
to be economically feasible on a large scale.

In addition to chemical pretreatment, naturally occurring
mutations found in grasses have been shown to
increase the rate of enzymatic hydrolysis via reductions in
lignin. Brown midrib (bmr) is a phenotype found in
grasses (maize [89], sorghum [90] and pearl millet [91])
that is associated with mutations in genes involved in
monolignol biosynthesis. These mutations have been
shown to lead to a reduction in the total lignin content
of the plant [92,93]. The brown colored midrib of the
leaf has been shown to be associated with a mutation in
cinnamyl-alcohol dehydrogenase (CAD), which causes
incorporation of cinnamyl-aldehydes in place of
cinnamyl-alcohol during lignin biosynthesis [72,94,95].
Additional bmr varieties have been shown to have
mutations in caffeic acid O-methyltransferase (COMT)
[96-98]. However, both CAD and COMT mutants exhibit
only reduced monolignol biosynthesis rather than its
total cessation, indicating that other CAD and COMT
genes may partially compensate for the mutated copies.
Theerarattananoon et al. found that a bmr mutant sorghum
variety had less total lignin than forage, grain, sweet and
photoperiod sensitive sorghum varieties [99]. In addition
to lower lignin contents, bmr varieties have been shown
to have increased susceptibility to chemical pretreatments.
In sorghum, it was found that bmr mutants were
more susceptible to alkaline pretreatment than non-bmr
varieties [100]. Corredor et al. demonstrated that bmr
sorghum varieties had a 79% hexose yield after enzymatic
hydrolysis, which was higher than two non-bmr varieties
which yielded 43% and 48% [101]. Additionally, sorghum
varieties that contain mutations in both COMT and
CAD have been shown to have lower lignin contents
than either mutant individually [102]. It is possible that
there are additional genes and alleles leading to lowered
lignin or other traits associated with higher hydrolysis
rates. The identification of new as well as known
lignification genes could lead to novel breeding programs
where stacking of genes could result in intrinsic increases
in lignocellulosic digestibility.

It is important to note that some maize bmr varieties
have been characterized as being susceptible to lodging
[103]. However, this susceptibility was not seen in
other maize studies, which may be attributed to differences
in genetic background [104,105]. This suggests
that selecting an optimal genetic background for the bmr
mutation may be important in creating a superior feedstock. In
addition to lodging, bmr mutants have been labeled as
more susceptible to disease and pathogen attack due to
reduction in the lignin barrier. However, accumulation
of lignin precursors has been shown to prevent the
production of virulence factors as well as limit fungal
pathogens [106-108]. It has also been widely reported that
bmr varieties experience a decrease in yield associated
with reduced lignin content. This has been seen in maize
[104,109,110] and sorghum [111,112] bmr varieties.
However, sorghum bmr hybrid varieties have been created
that achieve yields similar to wild type [113],
suggesting that the genetic background of the mutant
variety is important in overcoming yield reduction.

Transgenic approaches have already shown potential
to increase saccharification efficiency in grasses.
Overexpression of miR156, which suppresses SQUAMOSA
PROMOTER BINDING PROTEIN LIKE (SPL) genes,
in switchgrass caused an increase in overall biomass
accumulation coupled with an increase in conversion
efficiency of 24.2-155.5% in non-pretreated
lignocellulosic material and of 40.7-72.3%
in acid pretreated samples [114]. In addition, moderate
overexpression of miR156 caused switchgrass plants not
to flower, reducing the possibility of transgene
escape. However, it should be noted that overexpression of
miR156 caused dwarfism in both rice [115] and maize
[116], which greatly reduces the plant's value as a
bioenergy feedstock. In addition, overexpression of R2R3-MYB4
transcription factors has been shown to repress
lignin biosynthesis in several species [117-120]. In
switchgrass, overexpression of PvMYB4 resulted in a
three-fold increase in hydrolysis efficiency [121].
However, as with the overexpression of miR156, these plants
had a smaller stature than control varieties,
limiting the gains made from increased hydrolysis efficiency.
Clearly, the identification of active small RNA regulatory
genes that do not affect biomass yield using genomic
approaches is an exciting avenue towards bioenergy grass
improvement.

Crystallinity index 

Crystallinity index (CI) is a parameter that is used to
determine the relative amount of crystalline cellulose in
lignocellulosic material. Increased crystallinity of
cellulose reduces cellulase binding to cellulose
due to reduced surface area. Conversely, increased
amorphous cellulose increases the surface
area, causing an increase in hydrolysis rates. CI has been
measured using x-ray diffraction [122], solid-state 13C
NMR [123], infrared spectroscopy (IR) [124-126] and
Raman spectroscopy [127]. CI has been shown to be
correlated with enzymatic hydrolysis of lignocellulosic
material. In Sorghum bicolor, CI has been shown to be
negatively correlated with hydrolysis rate in whole plant
tissue [128]. It has also been shown in sorghum as well
as maize that stem has a higher crystalline content than
leaf tissue [129]. Furthermore, sorghum bmr mutants as
well as wild type varieties experience an increase in CI
after pretreatment with 1 M NaOH. This observation is
attributed to the removal of the amorphous component
of the lignocellulosic biomass, leaving a larger fraction of
crystalline material. However, it was also observed that
increasing the concentration of NaOH to 5 M
decreased CI, which was attributed to a change in
crystal structure and cellulose amorphization
[100]. A similar trend was seen in dilute acid
pretreatment of five sorghum varieties. Dilute acid pretreatment
of sorghum at 140°C resulted in an increase in CI;
however, increasing the temperature during pretreatment to
165°C resulted in a decrease in the CI of 4 of 5 sorghum
varieties [99]. This change in cellulose composition after
pretreatment has been previously demonstrated in various
industrial cellulose samples pretreated with NaOH
[130,131]. Sugarcane bagasse was also shown to
experience an increase in crystallinity after pretreatment with
peracetic acid, which was attributed to a decrease in the
amorphous component of the plant biomass [81].
Corredor et al. demonstrated that dilute acid pretreatment of
bmr and non-bmr sorghum varieties increased CI [101].
In addition, hydrolysis of the same samples resulted
in a reduction in CI.
Liu et al. found that, as in sorghum, acid pretreatment of
maize biomass causes an increase in CI. However, the
harshest pretreatment conditions cause a decrease in
crystallinity, likely due to disruption of the cellulose
crystalline structure [132]. This trend was confirmed by
Mittal et al., who also demonstrated that the crystallinity of
corn stover depends on the specific conditions of alkali
pretreatment. Additionally, Bari et al. demonstrated that
maize husks experienced an increase in CI after both
acid (H2SO4) and alkali (NaOH) pretreatment processes
[133]. It should be noted that previous studies have
demonstrated that the cellulose binding domain of
cellulases disrupts cellulose crystalline structure and causes a
decrease in CI [134,135]. This suggests that cellulose
binding, in conjunction with a decrease in
cellulose content, plays a role in the reduction in crystallinity
index during enzymatic hydrolysis. Therefore, finding favorable
genetic variation in endogenous and pretreated CI is a
logical approach to improve hydrolysis yield [128].
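
To make the CI parameter concrete, the sketch below computes a crystallinity index from two x-ray diffraction intensities using the Segal peak-height method. This is an assumption made for illustration: the text above states only that CI has been measured by x-ray diffraction, and the cited studies may have used other calculations. The function name and the example intensities are hypothetical.

```python
def segal_crystallinity_index(i_crystalline, i_amorphous):
    """Crystallinity index (%) by the Segal peak-height method.

    ASSUMPTION: the Segal method is used here for illustration only.
    i_crystalline: height of the main crystalline cellulose peak
                   (near 2-theta of about 22.5 degrees for cellulose I).
    i_amorphous:   intensity at the amorphous minimum
                   (near 2-theta of about 18 degrees).
    """
    if i_crystalline <= 0:
        raise ValueError("crystalline peak intensity must be positive")
    if not 0 <= i_amorphous <= i_crystalline:
        raise ValueError("amorphous intensity must lie in [0, i_crystalline]")
    return 100.0 * (i_crystalline - i_amorphous) / i_crystalline


if __name__ == "__main__":
    # Hypothetical intensities before and after a pretreatment that
    # removes amorphous material (lignin, hemicellulose).
    print(segal_crystallinity_index(1000.0, 550.0))  # untreated
    print(segal_crystallinity_index(1000.0, 400.0))  # pretreated
```

In this form, removing amorphous material lowers the amorphous intensity relative to the crystalline peak and so raises CI, which matches the direction of the pretreatment effects described above.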

Not all pretreatment strategies lead to an increase in
CI. Pretreatment strategies that are particularly harsh
initially increase CI through removal of amorphous
components, followed by subsequent dissolution of
crystalline cellulose. For example, Kimon et al. demonstrated
that dissolving sugarcane lignocellulosic material with
ionic liquids at temperatures >150°C causes a reduction
in the cellulose CI and a large increase in glucan
saccharification, while temperatures <150°C have a small
effect on crystallinity, which was associated with a
slower initial rate of glucan saccharification [84].
Therefore, a screen for bioenergy grass genotypes that respond
to harsh pretreatments in a favorable way could identify
better feedstocks.

CI has been shown to differ between plant species, as
well as between varieties within a species. When
compared to different sorghum varieties, maize has been
shown to have a higher CI [99]. Vandenbrink et al.
demonstrated that CI differed between 18 different
varieties of Sorghum bicolor, and these differences in CI were
associated with hydrolysis rate [128]. Harris et al. found
that crystallinity index differed among a large variety of
plants which included sweet sorghum, switchgrass, giant
Miscanthus, sweet Miscanthus, flame Miscanthus,
gamagrass, big bluestem and Arabidopsis [136]. However, it
must be pointed out that for many of these species only a
small number of varieties were tested, which may not give
an accurate depiction of CI in a diverse population where
one genotype is one data point. These studies provide
evidence that, due to differences in CI between species and
varieties, there may be a significant genetic component
associated with the trait.

There is much debate about the changes in crystallinity
experienced during enzymatic hydrolysis of
lignocellulosic materials. Various studies have demonstrated that
amorphous cellulose components are hydrolyzed
preferentially to crystalline components, resulting in an
increase in crystallinity as enzymatic hydrolysis occurs
[80,137,138]. However, various other studies have
demonstrated that hydrolysis results in little change to
crystallinity over the course of enzymatic hydrolysis
[139,140], which was attributed to the synergistic action
of endo- and exo-glucanase activities [87,141]. Moreover,
studies have shown that the cellulose
binding domain of multiple cellulases disrupts the
supermolecular structure of cellulose, resulting in a
decrease in CI [134,135]. These competing effects make it
difficult to measure changes in CI during enzymatic hydrolysis.

Enzyme adsorption 

Non-specific cellulase adsorption to biomass plays a
crucial role in determining the effectiveness of enzymatic
hydrolysis. Due to the high cost of enzymes for
commercial scale hydrolysis, adsorption and desorption rates in
specific genotypes should be pre-determined. After
hydrolysis, enzymes can either remain adsorbed to the
substrate or be unbound in the hydrolysate [142]. Cellulase
adsorption depends largely on the concentration of the
protein, as well as cellulose concentration and available
surface area [143]. Initial protein adsorption has been
shown to correlate with the initial rate of cellulose
hydrolysis [19,144]. Multiple studies have shown that total
enzyme adsorption is directly related to hydrolysis rate
and yield [145-148]. Strong correlations between
available surface area and rate of hydrolysis have also been
observed [23,149,150]. This increase in hydrolysis rate
can be attributed to increased adsorption. Nutor et al.
found that initial protein adsorption occurs quickly,
reaching a maximum in 30 minutes, followed by 55-75%
desorption [151]. Increasing the amount of enzyme
adsorbed onto cellulose substrate is a potential avenue
to increase hydrolysis rates, and it remains untested
whether specific cellulases are better adsorbed in specific
bioenergy grass feedstock varieties.
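
The dependence of adsorption on protein concentration and available surface area can be sketched with a Langmuir isotherm. This model is an assumption introduced here for illustration; the text above does not state which adsorption model the cited studies used. The function name, parameter names, and values are hypothetical, with max_capacity standing in for the accessible surface area of a given feedstock.

```python
def langmuir_bound(free_enzyme, max_capacity, affinity):
    """Bound enzyme per unit substrate under a Langmuir isotherm.

    ASSUMPTION: single-site, reversible, monolayer adsorption.
    free_enzyme:  free cellulase concentration (e.g. mg/mL)
    max_capacity: saturation binding capacity (e.g. mg enzyme / g substrate),
                  a stand-in for accessible surface area
    affinity:     adsorption equilibrium constant (e.g. mL/mg)
    """
    if min(free_enzyme, max_capacity, affinity) < 0:
        raise ValueError("all arguments must be non-negative")
    return max_capacity * affinity * free_enzyme / (1.0 + affinity * free_enzyme)


if __name__ == "__main__":
    # Two hypothetical feedstocks differing only in accessible surface area.
    for capacity in (10.0, 30.0):
        bound = [round(langmuir_bound(e, capacity, 1.0), 2) for e in (0.5, 1.0, 5.0)]
        print(f"capacity={capacity}: bound={bound}")
```

Under this sketch, bound enzyme rises with free enzyme concentration but saturates at the capacity term, so a feedstock with more accessible surface binds more enzyme at every dose.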

Cellulase adsorption to lignin reduces cellulase activity
by sequestering the enzyme away from its substrate.
After the completion of hydrolysis, non-specific binding
to lignin that has been freed during hydrolysis has been
shown to occur, with 30-60% of the cellulase remaining
bound to the lignin fraction [152,153]. This non-specific
binding has been shown to be only partly reversible [154].
Adsorption of cellulases to isolated lignin has been reported,
supporting claims that non-specific binding occurs to
the lignin fraction during hydrolysis [155,156]. Any
cellulase bound to lignin is not available to hydrolyze
cellulose, limiting overall efficiency. Hydrolysis rates of
cellulose have been shown to be correlated with the
tightness and affinity of adsorption [157]. Removal of lignin
not only reduces the steric hindrance to the enzyme,
but also reduces the lignin available for non-specific
binding [158,159].

Protein adsorption interactions are usually non-covalent
(hydrogen bonding, electrostatic or hydrophobic
interactions [160]). Surface characteristics of
lignocellulosic material are thought to play a major role
in cellulase adsorption, where high surface
hydrophobicity results in increased adsorption. Cellulases
have been shown to have hydrophobic amino acids
exposed on the outside of the protein, which interact
with the hydrophobic surface of cellulose [161]. The
affinity of cellulase for hydrophobic substrates may explain
non-specific binding to lignin, which is highly
hydrophobic. In addition to this, metal ions have been shown to
increase (in the case of Mn++) and decrease (in the case of
Hg++) the adsorption affinity and tightness of binding to
the hydrophobic surface of cellulose [44].

In order to drive down the cost of enzymatic hydrolysis,
strategies to recycle cellulases are being developed
[141,162-165]. Enzymes can be recovered from either
bound substrate or from the liquid hydrolysate that
remains after the first round of hydrolysis. Recovery of
the enzyme from bound substrate can be achieved
through washing with surfactant (such as Tween 20
[166]) or through recovery of the solid substrate to
which the cellulase remains bound [162]. Cellulase
recovered from lignocellulosic residue and used for
subsequent rounds of hydrolysis has been shown to
experience reduced activity, which has been attributed 
to accumulation of bound lignin after each successive 
round of hydrolysis [154,163]. Recovery of enzyme from 
the liquid hydrolysate has been traditionally been done 
through ultracentrifugation techniques [142,167,168]. 
While this method has been proven effective, it would 
be costly to scale up to industrial magnitudes. A more 
effective method may be to exploit cellulase affinity for 
cellulose, in which the addition of cellulose to cellulase- 
containing hydrolysate results in re-adsorption onto the 
fresh cellulose substrate [163,169,170]. Tu et al. found
that addition of fresh substrate to hydrolysate recovered
~50% of cellulases [171]. Additionally, bound enzyme
could be recovered by contacting the
bound substrate with fresh substrate [172]. However,
sequential hydrolysis with recovered enzyme results in
decreasing hydrolysis rates due to non-specific binding.
Additionally, it must be noted that β-glucosidase does
not bind to cellulose substrate, and must be added at the
beginning of each round of hydrolysis in order to
prevent the buildup of cellobiose and the resulting substrate
inhibition [171]. It is therefore necessary to develop
techniques that are able to efficiently desorb cellulase 
from bound substrate. Deshpande et al. found that 90% 
of cellulase was recoverable from steam-exploded wheat 
straw [152]. Jackson et al. found that using a surfactant
such as Tween 80 resulted in a recovery of 6-77%,
depending on the concentration of Tween 80 and the pH of the
solution [166]. Additionally, Jackson et al. revealed that 
the highest protein recovery does not necessarily dictate 
the highest activity recovery, and that alkali conditions 
may be responsible for deactivation of the enzyme. Otter 
et al. demonstrated that Tween 80 and Triton X were 
able to desorb 65-68% of bound cellulase under alkaline 
conditions [173]. Qi et al. demonstrated that enzyme
recycling from alkali- and dilute-acid-pretreated wheat straw
was comparable when using ultracentrifugation and additional
substrate techniques [174]. However, the additional
substrate technique requires addition of β-glucosidase after
each round of hydrolysis, whereas ultracentrifugation
does not. Finally, there was a noticeable difference in
enzyme recovery between dilute-acid and alkali pretreated
samples, where alkali pretreated samples were able to
desorb a larger amount of cellulase. While this
discussion is focused on the putative industrial processes, it
may be that specific feedstock varieties naturally exhibit
lower adsorption rates that would further enhance the
engineering endeavors.
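
The cost argument for recycling can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name, the dose units and the four-round scenario are assumptions, and it optimistically treats recovered enzyme as fully active, which the studies cited above indicate is not the case. It should therefore be read as an upper bound on the savings, using the ~50% recovery reported by Tu et al. as an example value.

```python
def fresh_enzyme_required(dose, recovery_fraction, rounds):
    """Total fresh enzyme added over `rounds` of hydrolysis.

    Round 1 uses a full fresh dose; each later round tops up only the
    fraction that was not recovered. Assumes (optimistically) that
    recovered enzyme is as active as fresh enzyme.
    """
    if not 0.0 <= recovery_fraction <= 1.0:
        raise ValueError("recovery_fraction must be between 0 and 1")
    if rounds < 1:
        raise ValueError("rounds must be at least 1")
    top_up = dose * (1.0 - recovery_fraction)
    return dose + top_up * (rounds - 1)


# Hypothetical scenario: 100 units per round, four rounds of hydrolysis.
with_recycling = fresh_enzyme_required(dose=100.0, recovery_fraction=0.5, rounds=4)
without_recycling = fresh_enzyme_required(dose=100.0, recovery_fraction=0.0, rounds=4)
print(with_recycling, without_recycling)
```

Any loss of activity in the recovered fraction, and the β-glucosidase that must be re-added each round, would narrow the gap between the two totals.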

In order for bioenergy to become a sustainable
alternative to traditional fossil-fuel based transportation
fuels, significant improvements to current enzymatic
hydrolysis methods must be made. Reduced enzyme
activity has been shown to be related to end-product
inhibition, production of phenolic compounds from
lignin, and metal ion inhibition. Additionally, steric
hindrance and a high ratio of crystalline to amorphous
cellulose reduce the cellulose available for enzymatic
hydrolysis. Non-specific binding of cellulases to
solubilized lignin has also been associated with reduced
hydrolysis rates. Finally, adsorption has been shown to
be correlated with the initial rate of hydrolysis, while
enzyme desorption is essential for enzyme recycling and
reducing the cost of enzymes in bioenergy production.
While these process components are being examined at
the engineering level, a simple screen of existing
bioenergy grass varieties could identify genotypes with a
favorable trait baseline, making the process engineering
task less difficult.

Bioenergy grass genetic mapping resources 

There are tens of thousands of bioenergy grass
genotypes in seed banks that have yet to be screened for
favorable bioenergy traits. In fact, many traits that have
been shown to deeply impact bioconversion yields have 
only been tested in a handful of genotypes. Surely, there 
are a multitude of relevant traits yet to be discovered. 
Therefore, we believe that genetic improvement is often 
premature until all screening options have been 
exhausted. With this caveat, genetic improvement in 
bioenergy grass feedstock can be achieved through 
transgenic manipulation or plant breeding programs. For 
example, centuries of selection have led to crops that 
provide high grain yields ideal for food production 
[13,175]. Many “elite” cultivars are dwarf varieties that 
allocate photosynthate towards larger grain yields as 
opposed to high cellulosic biomass. In grasses, the trend 
towards reduced lignocellulosic biomass could be rapidly 
reversed as genetic loci for plant height are few and well 
characterized [176-178]. In addition, the bioenergy traits 
discussed above can be genetically mapped to genomes,
DNA markers associated with the trait can be developed, and
alleles can be sorted into elite and novel cultivars. Once
relevant DNA markers are identified, these traits can be
selected for in breeding programs using marker-assisted
selection (MAS; [179]) or genomic selection (GS; [180])
techniques. If the causal gene is identified, it can be 
introduced transgenically [181] to create elite bioenergy 
feedstock varieties. 

In this section, we discuss the extensive genetic tools
available for mapping traits in the genomes of bioenergy
grasses as well as examples of previously mapped
bioenergy traits.

Genetic mapping techniques available for bioenergy
grasses include mapping Quantitative Trait Loci (QTLs)
through linkage mapping in biparental populations
[182], association mapping in a genetically diverse
population [183], and nested association mapping (NAM)
[184,185]. QTL mapping requires relatively sparse
marker coverage but identifies only broad chromosomal regions
associated with a trait of interest [182]. Association
mapping analysis often requires prior knowledge of genes of
interest or a full genome scan with high marker coverage
to be successful [186]. NAM populations
exploit the benefits of both QTL and
association mapping approaches [184,185]. It should be
noted that genetic population structure can cause
confounding correlation between markers and phenotypes
within subpopulations [187,188]. The existence of distinct
subpopulations can cause bias in the estimation of allelic
effects and errors in QTL detection [189]. Thus, it is
critical to generate panels that are genetically diverse and
in which population structure is clarified and corrected
prior to genotype-phenotype associations [190]. All three
genetic resources exist for diploid maize and sorghum
bioenergy grasses and have been successful in mapping
traits for years (see examples below). These approaches
are more difficult in complex polyploids such as
switchgrass, Miscanthus, and sugarcane, but there has been
success in QTL mapping for these species (see examples
below).
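
The confounding effect of population structure described above can be illustrated with a toy calculation. The sketch below is not a method from the cited studies: the data, the subpopulation labels and the function names are invented for illustration, and centering values on subpopulation means is only the simplest possible stand-in for the structure corrections used in practice. It shows a marker that appears correlated with a trait only because allele frequency and trait mean both differ between two subpopulations.

```python
from statistics import mean


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has no variance."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy / (sxx * syy) ** 0.5


def center_within_groups(values, groups):
    """Subtract each subpopulation's mean from its members."""
    group_means = {
        g: mean([v for v, h in zip(values, groups) if h == g]) for g in set(groups)
    }
    return [v - group_means[g] for v, g in zip(values, groups)]


# Hypothetical panel: two subpopulations that differ in allele frequency
# and in trait mean, with no marker effect inside either subpopulation.
subpop = ["A"] * 6 + ["B"] * 6
marker = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 0]
trait = [10, 11, 9, 10, 10, 10, 20, 21, 19, 20, 20, 20]

naive_r = pearson(marker, trait)
corrected_r = pearson(
    center_within_groups(marker, subpop), center_within_groups(trait, subpop)
)
print(round(naive_r, 3), round(corrected_r, 3))
```

In this toy panel the pooled correlation is substantial, while the correlation after removing subpopulation means vanishes, which is the pattern a spurious, structure-driven association would be expected to show.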

Quantitative trait loci 

Genetically defined mapping populations are a useful
resource for locating DNA markers and mapping genes
associated with desirable bioenergy traits. In these
populations, quantitative trait loci (QTLs), intervals in the
genome where DNA markers show a non-random
association with a quantitative trait, can be identified [191],
and the causal gene can possibly be mapped, albeit with
difficulty (but see below). DNA markers associated with
bioenergy QTLs can be used to breed, without extensive
phenotyping, superior varieties that contain a
collection of genes desirable in a bioenergy feedstock [179]. A
key advantage of QTL mapping is that polymorphic
DNA markers can be easily developed without a
reference genome and they do not need to be at high density
across the genome.
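
The idea of a marker showing a non-random association with a quantitative trait can be sketched as a single-marker scan. This is a deliberately minimal illustration, not the interval-mapping procedures used in the cited studies: the marker names, genotype coding (0 and 1 for the two parental alleles in an inbred population) and plant heights are all invented, and the statistic is simply the fraction of phenotypic variance explained by genotype class.

```python
from statistics import mean


def marker_effect(genotypes, phenotypes):
    """Return (allele effect, variance explained) for one biallelic marker.

    genotypes: list of 0/1 codes (parent A allele / parent B allele).
    """
    a = [p for g, p in zip(genotypes, phenotypes) if g == 0]
    b = [p for g, p in zip(genotypes, phenotypes) if g == 1]
    if not a or not b:
        return 0.0, 0.0
    grand = mean(phenotypes)
    ss_total = sum((p - grand) ** 2 for p in phenotypes)
    ss_between = len(a) * (mean(a) - grand) ** 2 + len(b) * (mean(b) - grand) ** 2
    r2 = ss_between / ss_total if ss_total > 0 else 0.0
    return mean(b) - mean(a), r2


def scan(markers, phenotypes):
    """Rank markers by variance explained, highest first."""
    results = [(name, *marker_effect(g, phenotypes)) for name, g in markers.items()]
    return sorted(results, key=lambda r: r[2], reverse=True)


# Hypothetical data: plant height (cm) for eight inbred lines.
heights = [150, 155, 149, 210, 205, 215, 152, 208]
markers = {
    "m1": [0, 0, 0, 1, 1, 1, 0, 1],  # co-segregates with height
    "m2": [0, 1, 0, 1, 0, 1, 1, 0],  # unlinked to height
}
ranked = scan(markers, heights)
for name, effect, r2 in ranked:
    print(name, effect, round(r2, 3))
```

A real analysis would add significance testing and use the genetic map to localize the QTL between flanking markers; the point here is only that a handful of well-spaced markers can already reveal the association.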

In the diploid species sorghum, QTLs have been
identified for many potentially advantageous traits valuable to
biofuel production. QTLs related to leaf size including
leaf width and leaf length [192] as well as leaf yield and
composition [193] have been identified. Stem
morphological traits such as height [178,193-203], diameter [192]
and tillering characteristics [191,193,195,202] as well as
stem composition and sugar content [193,201] have been
associated with QTLs in sorghum. In addition, QTLs for
flowering time or maturity have been shown to increase
overall biomass by increasing the period of plant growth
[178,194-198,201-205]. QTLs have also been analyzed
for kernel weight [191,194,195,199,200,206,207] as well as
grain composition [200,206,208,209]. In addition, QTLs for
post-harvest regrowth (ratooning) [191,193] may have the
potential to increase total biomass yield by producing
additional biomass post-harvest. A recent study has mapped
bioenergy QTLs, including biomass and stem sugar
content, in a cross between a grain and a sweet sorghum [210].
The DNA markers identified in these studies can be used
in breeding programs and demonstrate that markers for
novel bioenergy traits such as the traits described above
can easily be generated in existing or novel QTL mapping
populations.

In maize, extensive research into QTLs of agronomic
traits has been conducted. QTLs for forage quality and
biomass composition have been comprehensively studied
[211-219] and may have the potential to increase
conversion efficiency. Also, because corn is a major food crop,
thorough investigation of mapping populations has
been conducted, leading to the identification of a
multitude of grain yield QTLs [220-233] which may
lead to larger starch-derived ethanol yields.
Additionally, QTLs for biomass related traits including both
plant height [177,234-242] and plant maturity/flowering
time [234-240,243,244] have been characterized, which
could lead to increases in overall biomass yield. Leaf
biomass characteristic QTLs [236,245-247] have also been
identified which can lead to increased biomass as well as
increased crop density resulting in greater yields. As with
sorghum QTL studies, the maize mapping populations
used in these studies can be used to map additional
bioenergy traits and these DNA markers can be used in
selection programs.

Complex polyploids such as Miscanthus sinensis,
switchgrass, and sugarcane have had substantially fewer
QTLs identified relative to the diploid grasses sorghum
and maize. In Miscanthus, QTLs for plant biomass traits
including leaf yield, stem yield and total plant height have been
identified [248,249], leading to potential increases in total
biomass. Additionally, flowering time QTLs have been
identified which may lead to increased biomass
accumulation [250,251]. Miscanthus also has potential as an
energy source for thermal conversion. This has led to the
identification of QTLs that affect thermal conversion
efficiency [252,253]. To date, there have not been QTLs
identified for the composition of Miscanthus biomass or
forage quality, but the extant mapping populations are
an excellent resource for mapping these traits. In
sugarcane, QTLs for stem sugar content have been identified
[254-257], but few other bioenergy QTLs have been
identified. These representative studies demonstrate that
QTL mapping is a realistic tool for mapping complex
traits in polyploid species. Below we discuss how
modern sequencing techniques can be used to sequence large
DNA segments underlying a QTL, which become a
powerful resource for identifying candidate genes even
in complex polyploids.

QTL mapping in polyploid bioenergy grasses should
improve with the development of new genomic
resources. Recently, a high density genetic map has been
developed for switchgrass [258], and two high resolution
linkage maps were created for Miscanthus sinensis
[259,260]. These high-density maps open the door to
mapping QTLs to other genomes through comparative
genomics. For example, the Miscanthus map studies found
that, of the sequenced grass species, sorghum has
the closest syntenic relationship to Miscanthus and
that Miscanthus sinensis is of tetraploid origin, consisting
of two sub-genomes. These genetic maps will allow
researchers to translate genetic tools from sorghum, such
as QTL studies and a sequenced genome, via synteny
relationships, thereby expanding the toolkit available for
Miscanthus. In addition, the high density linkage maps
can be used for Miscanthus genome assembly as well as
QTL studies. Known and as yet undetected QTLs are a
valuable source of DNA markers, often in
multiple genome positions, that can be used to select for
improved feedstock varieties before a crop development
cycle is complete.

Minimal progress has been made in the development
of superior cultivars from the identification of QTLs
associated with bioenergy traits. This may be due to the
limited transferability of QTL information, because
QTLs are specific to alleles from inbred mapping
parents. It may be that robust QTLs detected under
multiple genetic backgrounds will be required. However,
MAS stacking of QTLs (pyramiding) has been successful
in other plant species as an avenue of crop
improvement. Zhang et al. used QTL pyramiding to increase
downy mildew tolerance in wild lettuce (Lactuca
saligna) [261]. In another example, rice yield [262] as well
as grain size and shape [263] have been modified
through QTL pyramiding strategies. This suggests that
given the ideal genetic background, genetic
improvement of bioenergy crops through QTL pyramiding may
be a viable way to produce superior feedstocks.

The NAM method for mapping QTLs relies on crossing
a genetically diverse set of founder lines
to a common parent to create a large
population of related progeny (often in the form of
Recombinant Inbred Lines or RILs). NAM has the benefit of
providing high QTL mapping resolution without
requiring high marker density within the population [264]. In
maize, a NAM population was created consisting of one
common parent crossed with 25 diverse parents to
produce 5,000 genetically distinct offspring [264]. A
sorghum NAM population is under development [265].
QTLs for leaf architecture (including leaf angle, leaf
length and leaf width) have been identified using the
maize NAM population [185]. In addition, NAM has
been used to identify QTLs for complex traits such as
resistance to northern leaf blight in maize [266]. While
NAM incorporates high resolution QTL mapping with
low marker coverage and high heterogeneity, it also
requires a large population size and a structured
population in order to be informative. This technique also
requires the screening of a large number of individuals,
which makes the identification of complex phenotypes
potentially very labor intensive. However, NAM and
other advanced genetic approaches are powerful
tools to dissect the genetic architecture of complex
bioenergy traits.

While QTL studies have potential for bioenergy gene
discovery, they also have limitations. Due to genetic
heterogeneity, QTLs may be overestimated or not detected.
There are also a variety of problems that arise in QTL
mapping of polyploid genomes such as sugarcane and
Miscanthus. These include an increased number of
genotypes per marker or QTL due to the increased
number of chromosomes in the homeologous set, marker
and QTL dosage in the parents and progeny that is not
obvious or observable, additional copies of a marker that can
mask recombination events, and pairing behavior of
chromosomes during meiosis that is usually unknown [267].
Furthermore, low density genetic maps make it difficult to
locate genes within a QTL region, which can contain
thousands of genes. Dense genetic maps based upon
sequence tagged markers, as is the case for sorghum [268],
are readily mapped to other genomes. In this way,
bioenergy QTLs can be identified in diploid sorghum and
mapped to complex genome bioenergy grasses for causal
gene inference and validation.

Association mapping (diversity) panels 

Association mapping is an alternative method for mapping
QTLs that is based on linkage disequilibrium (LD)
arising from historical recombination events in genetically
diverse populations [269,270]. Association mapping
utilizes marker-phenotype associations to determine if
certain DNA markers co-segregate with a phenotype of
interest. Association mapping generally falls into one of
two categories: i) candidate gene association mapping,
which looks for markers and causal variation among
polymorphisms in a subset of genes of interest, and ii)
genome wide scan association mapping (GWAS), which
scans the whole genome using dense marker sets to find
marker associations with complex traits. Association
mapping offers multiple benefits over traditional QTL
mapping populations. QTL mapping populations suffer from
restrictions due to limited genetic heterogeneity, in that a
QTL that is mapped in one mapping population derived
from two genetic backgrounds may not be applicable
to other populations with parents derived from different
lineages [271,272]. Association mapping panels, however,
benefit from higher resolution of identified QTLs
than traditional QTL mapping methods [273]. While
association mapping requires a large, diverse collection of
germplasm (a diversity panel) to map QTLs, it does not
require generation of inbred or backcrossed populations.
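
Linkage disequilibrium between two markers is commonly summarized with the r² statistic, which compares the observed frequency of a two-locus haplotype with the frequency expected if the loci were independent. The sketch below is a minimal illustration under stated assumptions: haplotypes are phased and coded 0/1 at two biallelic loci, and the example data are invented.

```python
def ld_r2(haplotypes):
    """r^2 between two biallelic loci from phased haplotypes.

    haplotypes: sequence of (allele_at_locus1, allele_at_locus2), coded 0/1.
    """
    n = len(haplotypes)
    if n == 0:
        raise ValueError("at least one haplotype is required")
    p_a = sum(h[0] for h in haplotypes) / n  # frequency of allele 1 at locus 1
    p_b = sum(h[1] for h in haplotypes) / n  # frequency of allele 1 at locus 2
    p_ab = sum(1 for h in haplotypes if h[0] == 1 and h[1] == 1) / n
    d = p_ab - p_a * p_b  # disequilibrium coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    return d * d / denom if denom > 0 else 0.0


# Hypothetical panels: alleles always inherited together vs. fully shuffled.
complete_ld = [(1, 1)] * 5 + [(0, 0)] * 5
no_ld = [(1, 1), (1, 0), (0, 1), (0, 0)] * 3

print(ld_r2(complete_ld), ld_r2(no_ld))
```

Markers in high r² with a causal variant can stand in for it in an association test, which is why the rate at which LD decays along the chromosome determines how dense a marker set a genome-wide scan requires.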

Association mapping populations have been created
for the bioenergy crops maize [274,275], sorghum
[176,276] and sugarcane [277]. In sorghum, association
mapping has led to the identification of markers for
height, flowering time, tiller number and stem sugar
[278,279]. In maize, association mapping has led to
the identification of markers for flowering time
[187,280,281], kernel composition [282] as well as starch
accumulation [283]. Fewer studies have been conducted
in sugarcane, which has a large complex genome with
high ploidy levels ranging from 5x to 14x [284]. Wei
et al. mapped disease resistance in 154 sugarcane
cultivars [277]. A key drawback to association mapping is
that the large population size required for successful
identification of trait markers requires that phenotyping
be done in a high-throughput manner, which
requires a large labor force or robotics. Often, this
restricts the DNA markers that can be identified
to those for traits where phenotyping is less intensive.

Reverse genetics 

In addition to the forward identification of DNA
markers (and genes) by mapping a bioenergy trait to a DNA
polymorphism, reverse genetic tools exist for the
identification of bioenergy genes from a panel of known
mutants. If the mutants are created in a parent with a
favorable bioenergy trait baseline, it is possible to map
genes and improve feedstock at the same time. In the
TILLING approach (Targeting Induced Local Lesions IN
Genomes), point mutations are randomly created
throughout the genome by treating seeds with a
mutagen (e.g. ethyl methanesulfonate (EMS)) [285-287].
These plants are selfed and screened for phenotypes of
interest. The DNA sequences from plants with mutant
phenotypes can be compared to the non-mutagenized
parental DNA to determine the relevant mutation. For
example, DNA can be purified in a high throughput
manner [288] and sequenced using high-throughput
techniques for the discovery of rare mutations [289]. If
the founding parent of the TILLING population has a
sequenced genome as a reference, sequencing of select
mutant individuals in candidate genes or whole genome
resequencing can be done to identify specific gene
mutations that lead to phenotypes of interest (e.g. [290]). As
proof of principle, a sorghum TILLING population has
been effective in the discovery of mutations giving rise
to the bioenergy-relevant brown mid-rib phenotype
[291] and altered hydrogen cyanide potential [292].
Once the gene variant underlying a trait is identified, the
gene can be sequenced (e.g. PCR amplicon sequencing),
and any DNA variants tested for association in
additional genotypes from the source and related organisms.
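
The comparison of mutant and parental sequence described above can be sketched in a few lines. The example assumes two already-aligned sequences of equal length (real pipelines start from sequencing reads and an aligner); the function name and the sequences are invented for illustration. Because EMS predominantly induces G/C to A/T transitions, each difference is flagged as EMS-like or not, which can help separate induced lesions from sequencing errors or background polymorphism.

```python
# Substitutions typical of EMS mutagenesis, as seen on either strand.
EMS_TRANSITIONS = {("G", "A"), ("C", "T")}


def find_point_mutations(parent, mutant):
    """List single-base differences between two aligned sequences.

    Returns tuples of (1-based position, parent base, mutant base, ems_like).
    """
    if len(parent) != len(mutant):
        raise ValueError("sequences must be aligned to the same length")
    calls = []
    for pos, (ref, alt) in enumerate(zip(parent.upper(), mutant.upper()), start=1):
        if ref != alt:
            calls.append((pos, ref, alt, (ref, alt) in EMS_TRANSITIONS))
    return calls


# Hypothetical amplicon from the parent and from two mutant lines.
parent_seq = "ATGGCCGATT"
line_1 = "ATGACCGATT"  # G>A at position 4, consistent with EMS
line_2 = "ATGGCCGTTT"  # A>T at position 8, not a typical EMS change

print(find_point_mutations(parent_seq, line_1))
print(find_point_mutations(parent_seq, line_2))
```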

TILLING populations have been created for the
bioenergy crops maize [293] and sorghum [294]. TILLING
has the potential to identify bioenergy traits such as
flowering time, total biomass, grain yield, conversion
efficiency, etc. TILLING as a strategy for biofuel
improvement does have its limitations. Because the mutations
induced by EMS are distributed randomly throughout
the genome, the TILLING strategy can require screening
thousands of individual lines to identify mutants in a
trait of interest. This requirement for high-throughput
phenotyping techniques limits the throughput of mutant
selection and gene detection. Furthermore, polyploid
genomes present problems associated with finding recessive
mutants due to the number of gene copies present in
the genome. In the case of bioenergy grasses, this is a
strong rationale for first identifying a causal genetic
lesion in a diploid genome (e.g. sorghum) and then testing
the effect of the mutation in more complex genomes
through plant breeding or transgenics. In summary,
advanced genetic and mutant populations are a powerful
approach to create varieties and map genes relevant to
bioenergy feedstock.

Bioenergy grass genomic resources 

The crop genetic studies reviewed above have identified
DNA markers associated with some high priority
bioenergy related traits such as total biomass and conversion
efficiency. These biomarkers have immediate utility in
bioenergy grass improvement, and it is certain that the
future will reveal many more biomarkers linked to known
and novel bioenergy traits. However, the DNA biomarker
often merely tags DNA near the gene(s) causing the
favorable phenotype. While effective in breeding, this level of
information leaves the underlying causal biochemical
pathways and mechanisms in a black box. If the
molecular mechanisms (and specific genes) underlying a trait
were to be deciphered, then the art of plant breeding
could be enhanced by searching for gene variants in other
genes in the same pathway(s) as the initially described
causal gene. Fortunately, the genome blueprints for
specific bioenergy crops have been deciphered in the last
decade. Using a reference genome assembly as a guide, it is
now possible to associate genetically mapped biomarkers
with nearby candidate genes and their functional activities.
This section surveys genomic resources available for
bioenergy grasses and discusses their utility in a genetically
mapped trait context.

While genome-wide measurements of gene output can
be obtained and interpreted without a reference genome,
a high-quality, annotated reference genome assembly
provides a natural scaffold to organize and interpret genetic
and genomic analyses. In the case of bioenergy grasses,
three key reference genomes have been sequenced and
annotated: maize [295], sorghum [296], and switchgrass
(http://www.phytozome.org/panicumvirgatum.php). Once
a genome assembly is constructed, it is annotated for
sequence features including gene models and copy number
(gene duplications), regulatory features, heterologous
genome alignments (synteny), and other dynamic
features such as gene expression levels under different
internal and external cues. An excellent genome assembly
resource for many plants, including maize, sorghum
and switchgrass, can be found at the DOE-JGI
Phytozome website [297].

The genome assembly sequence is a stable coordinate
system to associate genome-mapped genetic signals (e.g.
QTL biomarkers, trait-associated SNPs) with functional
genomics information such as nearby genes, gene
expression levels, and biochemical pathways. If the sequences
of DNA biomarkers are known, one can often locate the
approximate genome position of a genetic signal and
find neighboring genes in a physical context. Through
the genome browser, biomarker DNA sequences can be
positioned using BLAT/BLAST alignment tools or
possibly through keyword searches. In some cases,
biomarker positions have been pre-computed, such as maize
genetic markers accessible at [298]. Neighboring gene
models are often annotated for function, usually via
homology mapping, and provide clues that a given gene
could be involved in the expression of a bioenergy trait.
Gene function annotations include conserved protein
domains (e.g. InterPro [299]), Gene Ontology (GO) terms
[300], and biochemical pathways (e.g. KEGG; [301])
including well annotated metabolic enzymes (e.g. RiceCyc
at Gramene [302]). These annotation terms provide
clues into what a gene near the biomarker is doing,
including possible pathway involvement, an indicator of
gene-gene interaction and complex trait mechanism. It
should be noted that genome browsers are highly
dynamic and are constantly being updated with new
information relevant to basic biology and possible bioenergy
trait mechanisms.
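
Once a marker has been placed on the assembly, finding neighboring gene models is a coordinate lookup. The sketch below shows the idea with an in-memory list; the gene identifiers, coordinates, marker position and 50 kb window are all hypothetical, and in practice the gene models would come from an annotation file downloaded from a genome resource such as those cited above.

```python
def genes_near_marker(genes, chrom, position, window=50_000):
    """Return IDs of genes within `window` bp of a marker, nearest first.

    genes: iterable of (gene_id, chromosome, start, end) tuples.
    A gene that spans the marker position has distance 0.
    """
    hits = []
    for gene_id, g_chrom, start, end in genes:
        if g_chrom != chrom:
            continue
        if start <= position <= end:
            distance = 0
        else:
            distance = min(abs(position - start), abs(position - end))
        if distance <= window:
            hits.append((distance, gene_id))
    return [gene_id for distance, gene_id in sorted(hits)]


# Hypothetical gene models and a marker mapped to chr1:58,000.
gene_models = [
    ("geneA", "chr1", 1_000, 5_000),
    ("geneB", "chr1", 60_000, 65_000),
    ("geneC", "chr1", 200_000, 210_000),
    ("geneD", "chr2", 58_000, 59_000),
]

print(genes_near_marker(gene_models, "chr1", 58_000))
print(genes_near_marker(gene_models, "chr1", 58_000, window=60_000))
```

The window size is a judgment call: it should reflect the resolution of the genetic signal, which is typically far coarser for a biparental QTL than for an association hit.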

While a reference genome view of an individual
organism is invaluable, there are a growing number of
databases focused on genome comparison and mapping
function between species. This translational genomics
approach is very important for the bioenergy grasses, as
gene function information can be discovered in a
well-studied diploid organism such as maize, rice, or
sorghum, for which the genome is easier to analyze relative
to complex polyploids like switchgrass, sugarcane and
Miscanthus. Translational genomics is possible between
bioenergy grasses because grass genomes in general have
maintained a similar structure, analogous to mammalian
genomes, since they diverged from a common ancestor
50-70 million years ago [303]. Therefore, genomes of
non-bioenergy grasses including rice [304] and
Brachypodium [305] are also useful reference blueprints for
grass gene function discovery and genome comparison
[306]. Through grass genome comparison, gene function
can be inferred in a poorly studied genome by
identifying orthologous chromosomal segments. For example,
the VISTA comparative genome browser
(http://pipeline.lbl.gov; [307]) visualizes pre-computed alignments
between the genomes of maize and sorghum as well as
many other plants. A rich resource for genetically mapped
information and grass genome comparison is Gramene
([302,308]). Finally, the Comparative Saccharinae
Genomics Resource (CSGR; [309]) is focused specifically on the
grasses including and related to the bioenergy grasses. For a
deep study of these resources, the reader is directed to
relevant chapters in [310]. The macroconservation of grass
genome structure is critical for genomic translation
between bioenergy grasses with complex genomes such as
sugarcane, switchgrass and Miscanthus. It may be a long
time before additional reliable assemblies of complex
polyploid genomes are realized, and at this time, we suggest
that sorghum is an ideal C4 bioenergy grass reference
genome due to a relatively small annotated genome and close
evolutionary proximity to other C4 bioenergy grasses.

The genome assembly provides physical coordinates of
known genes, and intergenome comparison explores the
dynamic movement of genes over evolutionary time
scales. A reference genome assembly is also a framework
for organizing dynamic gene output measurements.
For example, bioenergy grass gene output at the RNA
level has been measured for over a decade using
first-generation genomic tools including the conversion of
tissue and treatment specific RNA samples into cDNA
followed by tedious cloning and sequencing. These
Expressed Sequence Tags (ESTs) have proven invaluable
in gene identification and can be found in databases at
the National Center for Biotechnology Information
(NCBI EST database) as well as the genome databases
mentioned above. Massively parallel measurements of
the RNA transcriptome response under multiple
treatments and conditions have been made for bioenergy
grasses using DNA microarrays. These experiments are
stored in raw and processed forms at the NCBI Gene
Expression Omnibus (GEO) database and are an
excellent functional genomic data mining resource for the
bioenergy grasses. For example, differences in gene
expression in a genetically defined population can be
associated with traits as eQTLs [311]. In addition,
thousands of gene co-expression interactions can be mined
from these datasets and transformed into gene
interaction networks (see examples below). These
functional genomics resources have been effective in
understanding the molecular function of many bioenergy grass
genes.

In recent years, rapid advances in DNA sequencing
technology coupled with a reference genome for mapping
sequences have resulted in multiple powerful next
generation genomic analytical tools [312]. New sequencing
technologies are capable of sequencing 10^5-10^8 DNA molecules
in a single experiment. As opposed to measuring molecule
levels through hybridization to microarrays, this depth of
coverage allows for counting of molecules such as
RNA-derived cDNA (RNAseq) or genomic DNA (re-sequencing)
fragments to such a degree that quantitative comparisons
can be made between samples. Example applications
include transcriptome profiling with RNAseq [313], de novo
transcript assembly [314], single nucleotide polymorphism
(SNP) discovery [315], the discovery of rare mutations in
mutagenized (e.g. TILLING) populations [289,290],
genotyping by sequencing (GBS; [316]) followed by GWAS or
GS [183], as well as whole [317] or partial de novo
genome assembly [318]. In short, emerging sequencing
technologies provide a high resolution lens into the
dynamic biology underlying organism development.
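
Read counts only become comparable between samples once differences in sequencing depth are accounted for. A minimal sketch of one common scaling, counts per million (CPM), is shown below; the gene names and counts are invented, and CPM is used here only as the simplest illustration, not as the normalization applied in the cited studies.

```python
def counts_per_million(counts):
    """Scale raw read counts so that each sample sums to one million."""
    total = sum(counts.values())
    if total == 0:
        raise ValueError("sample has no mapped reads")
    return {gene: c * 1_000_000 / total for gene, c in counts.items()}


# Hypothetical samples: sample_b was sequenced four times deeper.
sample_a = {"gene1": 100, "gene2": 300, "gene3": 600}
sample_b = {"gene1": 400, "gene2": 600, "gene3": 3_000}

cpm_a = counts_per_million(sample_a)
cpm_b = counts_per_million(sample_b)

# gene1 has 4x more raw reads in sample_b but the same relative abundance.
print(cpm_a["gene1"], cpm_b["gene1"])
```

Comparing raw counts would suggest gene1 is four-fold higher in the second sample, whereas the scaled values show no difference; dedicated differential expression tools use more robust normalizations built on the same principle.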

Ongoing and historical genetic studies of bioenergy
traits can be the immediate beneficiaries of these new
sequencing technologies in that known gene regions can be
sequenced and validated. For example, given the correct
mix of resources, candidate genes and QTLs can now be
cloned in a cost effective manner. In one scenario, a QTL
for a relevant trait is mapped even at low marker
resolution without a reference genome. Then, marker probes
proximal to the QTL are used to screen a BAC library to
identify nearby BACs. Once candidate BACs are
identified, they can be pooled and cheaply sequenced as has
been performed for melon (57 BACs; [319]), the complex
genome of barley (91 BACs; [320]), and cacao (27 BACs;
[318]). BAC pool assemblies can be annotated for
candidate genes, used to design probes for additional BAC
selection, and act as a reference sequence for resequencing
applications. Of course, the process of BAC selection is
enhanced if a physical map exists that can be used to
identify a BAC minimum tiling path (e.g. [318]). In the case of
switchgrass, a physical map might resolve the polyploidy
issue in BAC selection [321], so individual genomes can
be separately pooled, thereby reducing the probability of
intergenome misassembly.

Many bioenergy traits including those outlined above
are complex in that they are controlled by multiple
genes. By looking at a bioenergy trait
as a systems biology problem, it may be
possible to identify multiple markers or causal alleles that
can be mixed in an appropriate genetic background to
achieve the desired effect on yield. A near complete set
of genes is known for a growing number of grasses (e.g.
sorghum, maize, rice), but how these genes function in
concert is poorly understood. Fortunately, modern
genomic tools allow for the detection of gene dependencies


Feltus and Vandenbrink Biotechnology for Biofuels 2012, 5:80 
http://www.biotechnologyforbiofuels.com/content/5/1/80


in the context of a relevant biochemical pathway or 
mapped trait that can be woven into gene interaction
networks [322]. For example, gene interaction networks
can be constructed that represent the non-random co-expression
of transcripts between genes [323,324] or the
physical interaction of gene products at the level of
protein-protein interaction (PPI; [325,326]). Integrated gene
sub-networks can be parsed from the overall network
and non-randomly coupled with known biochemical
pathways (e.g. fermentable sugar metabolism) or genetic
signals (e.g. biomass yield) through a reference genome
using systems biology techniques [323,327,328]. For instance,
gene co-expression networks have been constructed
for many plants including rice [329,330] and
maize [323]. Co-expressed gene modules have been identified
in these networks, and some of the networks are
enriched in genes that, when mutated, give rise to specific
phenotypes that can be translated to the maize genome
via conserved sub-graphs [323]. Gene regulatory networks
can also be mapped to co-expressed gene modules
[331]. It is possible to construct additional co-expression
networks for other bioenergy grasses using RNAseq input,
as has been done for potato [332].
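
As a rough illustration of how a co-expression network is derived from
expression data, the following sketch links two genes when the absolute
Pearson correlation of their expression profiles meets a threshold, and
then reports connected components as candidate modules. It is a
simplified stand-in rather than the method of any cited study:
published networks generally rely on more rigorous threshold selection
and module detection. The gene names, expression values, and the 0.9
cutoff are all assumptions made for the example.

```python
from itertools import combinations
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equal-length expression profiles."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = sqrt(sum((b - mean_y) ** 2 for b in y))
    if sd_x == 0 or sd_y == 0:
        return 0.0  # a flat profile carries no co-expression signal
    return cov / (sd_x * sd_y)


def coexpression_edges(expression, threshold=0.9):
    """Return gene pairs whose |r| meets the (assumed) threshold."""
    edges = set()
    for g1, g2 in combinations(sorted(expression), 2):
        if abs(pearson(expression[g1], expression[g2])) >= threshold:
            edges.add((g1, g2))
    return edges


def modules_from_edges(edges):
    """Group genes into modules as connected components of the network."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    seen, modules = set(), []
    for gene in sorted(adjacency):
        if gene in seen:
            continue
        stack, component = [gene], set()
        while stack:
            node = stack.pop()
            if node not in component:
                component.add(node)
                stack.extend(adjacency[node] - component)
        seen |= component
        modules.append(component)
    return modules


# Toy RNAseq-like expression values across five samples (made-up data).
toy_expression = {
    "geneA": [1.0, 2.0, 3.0, 4.0, 5.0],
    "geneB": [2.0, 4.0, 6.0, 8.0, 10.5],
    "geneC": [5.0, 1.0, 4.0, 2.0, 3.0],
    "geneD": [10.0, 2.0, 8.0, 4.0, 6.0],
}
toy_edges = coexpression_edges(toy_expression)
toy_modules = modules_from_edges(toy_edges)
print(toy_edges, toy_modules)
```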

A systems genetics approach allows for both the prediction
of complex polygenic genotype-phenotype interactions
and the translation of this information
from diploid to polyploid genomes, a key asset in bioenergy
grass improvement. We believe that gene interaction
networks will significantly reduce the candidate
gene list underlying a bioenergy trait if the requirement
is made that the genomic positions of interacting genetic
signals (e.g. a QTL set, multiple LD blocks from a GWAS,
or genes mapped in mutant lines that result in the same
phenotype) must overlap with tightly interacting genes
from the network (e.g. [323]). It is at the intersection of
genetics and genomics that complex bioenergy traits,
which by definition are polygenic, can be tested as a genetic
sub-system as opposed to breaking the system into
individual genetic components such as a single large-effect
QTL.
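
The filtering requirement described above can be sketched in a few
lines. The example below is only one possible reading of that
requirement, under stated assumptions: genes have single-point genomic
positions, QTLs are simple intervals, and a network module is retained
only when its member genes fall inside at least two distinct QTLs for
the same trait. All identifiers, coordinates, and the two-QTL cutoff
are hypothetical.

```python
def genes_in_qtls(gene_positions, qtls):
    """Map each gene to the set of QTL intervals that contain it."""
    hits = {}
    for gene, (chrom, pos) in gene_positions.items():
        for qtl_name, (q_chrom, q_start, q_end) in qtls.items():
            if chrom == q_chrom and q_start <= pos <= q_end:
                hits.setdefault(gene, set()).add(qtl_name)
    return hits


def prioritize_candidates(modules, gene_positions, qtls, min_qtls=2):
    """Keep positional candidates whose module spans >= min_qtls QTLs."""
    hits = genes_in_qtls(gene_positions, qtls)
    candidates = set()
    for module in modules:
        covered = set()
        for gene in module:
            covered |= hits.get(gene, set())
        if len(covered) >= min_qtls:
            # Only genes that sit inside a QTL are reported.
            candidates |= {gene for gene in module if gene in hits}
    return candidates


# Made-up positions, QTL intervals, and network modules.
toy_positions = {
    "g1": ("chr1", 150),
    "g2": ("chr2", 520),
    "g3": ("chr1", 180),
    "g4": ("chr3", 40),
    "g5": ("chr2", 900),
}
toy_qtls = {
    "qtlA": ("chr1", 100, 200),
    "qtlB": ("chr2", 500, 600),
}
toy_network_modules = [{"g1", "g2", "g4"}, {"g3", "g5"}]

positional = set(genes_in_qtls(toy_positions, toy_qtls))
prioritized = prioritize_candidates(toy_network_modules, toy_positions, toy_qtls)
print(sorted(positional), sorted(prioritized))
```

In this toy case three genes lie under a QTL, but only two of them
belong to a module that connects both QTLs, which is the kind of
reduction in the candidate list that the text anticipates.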

Conclusions 

Given the uncertainties involved with long-term fossil
fuel production and increased carbon emissions affecting
global climate, the pursuit of sustainable fuels from
lignocellulosic biomass is important. We conclude that
a deeper understanding of feedstock traits affecting bioconversion,
such as enzyme inhibition, cellulose accessibility,
and enzyme adsorption, will ameliorate hurdles to
bioenergy production so that it is competitive with
current fossil-fuel-based transportation fuels. While
these factors limit the efficiency of enzymatic bioconversion,
they also provide a myriad of opportunities
for end-product yield improvement through feedstock
genetics coupled with process engineering. Breeding
programs that have historically focused on increased
grain yields can be shifted to focus on traits yielding
high-biomass, hydrolysis-efficient bioenergy crop varieties.
It should be noted, however, that vast bioenergy
grass seed stocks still need to be screened for high yield
baselines prior to breeding new varieties. For example,
future or extant varieties that contain low lignin (such as
bmr maize, sorghum and millet) may help to reduce
steric hindrance to hydrolytic enzymes as well as reduce
non-specific binding and increase enzyme recovery.
Additionally, reduced lignin content has the potential to
reduce the amount of phenolic compounds released
during pretreatment and hydrolysis, which reduces inhibition
of cellulase. Through the coupling of DNA biomarkers
to these traits, better crops can be developed
through marker-assisted selection, and rapid advances in
genomic and systems biology techniques should reveal
novel biochemical mechanisms that can be engineered
into current feedstock varieties. It is our belief that close
collaboration between the plant breeder, systems biologist,
and process engineer will result in accelerated development
of bioenergy grass feedstock tailored to a specific
conversion process, thereby increasing bioenergy viability
through industrial genetics.